SC Homework 6

Roger Jang


Due date: 20160605 23:59:59
Note that the description of this homework may be lengthy, but the concept is pretty straightforward. If you have questions about this homework, please come to my office hour at 5pm, 5/25 (Wed), at CSIE basement.

Query by Singing/Humming

Outlines

Problem definition

Dynamic programming (DP) is a very important and useful method for sequence comparision. It has been used extensively in speech recognition, natural language processing, music analysis, and a variety of other areas. In this homework, we shall explore the use of DP for query by singing/humming (QBSH), which is an interesting paradigm of music information retrieval. (For simplicity, we shall assume our QBSH system always starts the comparison from the beginning of a song.)

In particular, you need to find the alignment between the pitch vector of human's singing and the note vector of a given melody, both in the unit of semitone or MIDI number. The correspondence between MIDI numbers and piano keyboard can be found here. Moreover,

Given a pitch vector and a note vector, your mission is to find an optimum mapping to assign each pitch point to a music note, such that the total distance is minimized. To be more specific, let's use the following notation:

The alignment path can be denoted by its coordinates, that is, $\{(0, u_0), (1, u_1), (2, u_2), ..., (m-1, u_{m-1})\}$, where $u_k, k=0, 1, \cdots, m-1$ are indices into the note vector with the following constraints:

For instance, the alignment path of the above figure is ${(0, 0),(1, 0),(2, 1),(3, 1),(4, 1),(5, 1),(6, 2),(7, 2),(8, 3),(9, 3),(10, 3),(11, 3)}$, with $m=12$ (the length of $p$) and $u_{m-1}=3$ (the last index of music note that have at least one pitch point being assigned to). In other words:

The distance between a pitch vector $p$ and note vector $q$ can then be defined as follows: $$ D(p, q)=\min_{u_0, u_1, ... u_{m-1}}\sum_{i=0}^{m-1} |p(i)-q(u_i)|, $$ subject to $u_0=0$ and $u_0 \leq u_1 \leq u_2 \cdots \leq u_{m-1}$.

Suggested approach

Based on the above description, we can solve the task using DP. The 3-step formula for DP can be described as follows:

  1. Optimum-value function $D(i,j)$ is defined as the minimum distance between $p(0:i)$ and $q(0:j)$.
  2. Recurrent equation for $D(i,j)$ is shown next, with $i>0$ and $j>0$: $$ D(i,j)=|p(i)-q(j)| +\min \left\{ \begin{matrix} D(i-1,j)\\ D(i-1, j-1)\\ \end{matrix} \right. $$ (See the local constraint in the previous figure.) Please derive the recurrent equation when $i=0$ or $j=0$ by yourself. The boundary condition is $D(0, 0)=|p(0)-q(0)|$.
  3. Answer to the original task: $$ dist(p, q) = \min_{0\leq j \leq n-1}D(m-1, j). $$

Input/output formats

Requirements & suggestions

Test set

Each test case is shown below. You can download the whole test set at this link. Note that the pitch vector is actually obtained from twinkle_twinkle_little_star.wav, except that we have performed fine tuning to take key transposition into consideration. (Don't worry, you can still do the homework if you don't know what key transposition is.) So these 48 cases corresponding to the comparison between the pitch vector and 48 songs in a QBSH system. You can see that the minimum distance of these 48 cases occurs when the MIDI file is "twinkle_twinkle_little_star.MID", indicating this is a successful retrieval based on the given pitch vector. The plot of distances to 48 MIDI files are shown next (where "小星星" is actually the song "Twinkle Twinkle Little Star"):

For each test input/output, you can see corresponding debugging info, which can be used to verify the corresponding results. More specifically,

Here is the list of each test case.

Slides

Slides for this homework (recording to be uploaded soon)

Submission guidelines

Please submit your code with GitHub as directed in the GitHub submission guide. Your directory structure (under GitHub repository) should be:
  1. hw6/*, your source code.
  2. hw6/Makefile, where the TAs can use the command ”make” on CSIE R217 linux machines to compile your code, and ”make run” to run your program.
You have to make sure your code can be compiled on CSIE R217 linux machines with your Makefile and run normally since we will test your program on CSIE R217 linux machines.